Andrew Ba Tran
3/10/2018
Find yourself repeating the same tasks over and over again?
Example:

sb <- read.csv("https://docs.google.com/spreadsheets/d/1gH6eUQVQsEmFagy0qzQDEuwb3cutMWaddCLc7ESbzjc/pub?gid=294374511&single=true&output=csv", stringsAsFactors=F)
(You can bring in a Google Sheet if you publish it as a CSV and copy the link over.)
pop <- read.csv("https://docs.google.com/spreadsheets/d/16oW_uvRJCNoOnCeAkJH4fDouFokjaGUdGFUCaFdKd6I/pub?output=csv", stringsAsFactors=F)
Which ones match?
library(tidyverse)
sb_adjusted <- left_join(sb, pop, by=c("State_Abbreviation"="Abbrev"))
library(knitr)
kable(head(sb_adjusted, 3))
| State_Abbreviation | Starbucks | State | Population |
|---|---|---|---|
| AK | 42 | Alaska | 741894 |
| AL | 65 | Alabama | 4863300 |
| AR | 37 | Arkansas | 6931071 |
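One way to answer "which ones match?" before relying on the join is to list the rows that have no partner. The following is a minimal sketch, not part of the original walkthrough: `sb_toy` and `pop_toy` are small stand-in data frames, and the "ZZ" row is made up purely to show what an unmatched key looks like.

```r
library(dplyr)

# Toy stand-ins for the real sheets; the "ZZ" row is invented for illustration
sb_toy <- data.frame(State_Abbreviation = c("AK", "AL", "ZZ"),
                     Starbucks = c(42, 65, 1),
                     stringsAsFactors = FALSE)
pop_toy <- data.frame(Abbrev = c("AK", "AL"),
                      Population = c(741894, 4863300),
                      stringsAsFactors = FALSE)

# Rows of sb_toy whose abbreviation has no match in pop_toy
unmatched <- anti_join(sb_toy, pop_toy, by = c("State_Abbreviation" = "Abbrev"))

# left_join keeps every row of sb_toy; unmatched rows get NA for Population
joined <- left_join(sb_toy, pop_toy, by = c("State_Abbreviation" = "Abbrev"))
```

If `unmatched` has any rows, the key columns disagree somewhere and the per capita figures for those rows will come out as `NA`.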
sb_adjusted$per_capita <- sb_adjusted$Starbucks/sb_adjusted$Population*100000
kable(head(sb_adjusted, 3))
| State_Abbreviation | Starbucks | State | Population | per_capita |
|---|---|---|---|---|
| AK | 42 | Alaska | 741894 | 5.661186 |
| AL | 65 | Alabama | 4863300 | 1.336541 |
| AR | 37 | Arkansas | 6931071 | 0.533828 |
Establish some rules.
All data sets you want to join with the population data set:
# Save the dataframe as a consistent name
any_df <- sb
# Rename the first column to "Abbrev"
colnames(any_df)[1] <- "Abbrev"
# Join by the similar name
df_adjusted <- left_join(any_df, pop, by="Abbrev")
# Do the calculations based on the values in the second column
df_adjusted$per_capita <- df_adjusted[,2] / df_adjusted$Population * 100000
kable(head(df_adjusted, 3))
| Abbrev | Starbucks | State | Population | per_capita |
|---|---|---|---|---|
| AK | 42 | Alaska | 741894 | 5.661186 |
| AL | 65 | Alabama | 4863300 | 1.336541 |
| AR | 37 | Arkansas | 6931071 | 0.533828 |
Turn your lines of code into a function by wrapping them with
function(arg1, arg2, ... ){ and }
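As a small illustration of the wrapping step, here is the per capita arithmetic first as loose lines and then as a function. The name `per_capita` and its `per` argument are hypothetical, chosen only for this sketch; the numbers are the Alaska values from the table above.

```r
# The calculation as loose lines of code
starbucks  <- 42
population <- 741894
starbucks / population * 100000

# The same calculation wrapped in a function (hypothetical helper)
per_capita <- function(count, population, per = 100000) {
  count / population * per
}

# Should reproduce the Alaska per_capita value shown earlier
per_capita(42, 741894)
```

The values that change from one use to the next become the arguments, and everything else stays inside the braces.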
Remember how there were two types of state ID data: full names and abbreviations? We can write the function so that you tell it which type to join by.
pc_adjust <- function(any_df, state_type){
  pop <- read.csv("https://docs.google.com/spreadsheets/d/16oW_uvRJCNoOnCeAkJH4fDouFokjaGUdGFUCaFdKd6I/pub?output=csv", stringsAsFactors=F)
  # State type options are either "Abbrev" or "State"
  colnames(any_df)[1] <- state_type
  df_adjusted <- left_join(any_df, pop, by=state_type)
  # Rate per 1,000,000 residents (the step-by-step version above used 100,000)
  df_adjusted$per_capita <- df_adjusted[,2] / df_adjusted$Population * 1000000
  return(df_adjusted)
}
kable(head(sb, 3))
| State_Abbreviation | Starbucks |
|---|---|
| AK | 42 |
| AL | 65 |
| AR | 37 |
test <- pc_adjust(sb, "Abbrev")
kable(head(test, 3))
| Abbrev | Starbucks | State | Population | per_capita |
|---|---|---|---|---|
| AK | 42 | Alaska | 741894 | 56.61186 |
| AL | 65 | Alabama | 4863300 | 13.36541 |
| AR | 37 | Arkansas | 6931071 | 5.33828 |
Alright, we've got it working with Starbucks data.
Let's try it with Dunkin' Donuts data.
dd <- read.csv("https://docs.google.com/spreadsheets/d/1TWuWZpfDUMWmMpc7aPqUQ-g1a1J0rUO8_cle_zcPyI8/pub?gid=1983903926&single=true&output=csv", stringsAsFactors=F)
kable(head(dd))
| State | Dunkin |
|---|---|
| Alabama | 18 |
| Alaska | 0 |
| Arizona | 59 |
| Arkansas | 7 |
| California | 2 |
| Colorado | 8 |
This time the states are spelled out in full rather than abbreviated.
Fortunately, we accounted for that when making the function.
dd_adjusted <- pc_adjust(dd, "State")
kable(head(dd_adjusted))
| State | Dunkin | Abbrev | Population | per_capita |
|---|---|---|---|---|
| Alabama | 18 | AL | 4863300 | 3.7011905 |
| Alaska | 0 | AK | 741894 | 0.0000000 |
| Arizona | 59 | AZ | 2988248 | 19.7440105 |
| Arkansas | 7 | AR | 6931071 | 1.0099449 |
| California | 2 | CA | 39250017 | 0.0509554 |
| Colorado | 8 | CO | 5540545 | 1.4439013 |
pc_adjust() is your tiny perfect function
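Because pc_adjust() downloads the population sheet on every call, it can be awkward to try out offline. The sketch below is one possible variation, not the author's function: `pc_adjust_local`, its `pop` argument and its `per` argument are hypothetical names, and the two-row tables reuse the Alaska and Alabama values shown earlier. It also uses `match.arg()` so that a `state_type` other than "Abbrev" or "State" stops with an error instead of silently renaming the column.

```r
library(dplyr)

# Hypothetical variant of pc_adjust(): same steps, but the population table
# and the rate multiplier are passed in instead of being fixed inside.
pc_adjust_local <- function(any_df, pop, state_type = c("Abbrev", "State"),
                            per = 1000000) {
  # Stop early if state_type is anything other than "Abbrev" or "State"
  state_type <- match.arg(state_type)
  colnames(any_df)[1] <- state_type
  df_adjusted <- left_join(any_df, pop, by = state_type)
  # [[2]] pulls the second column out as a plain vector
  df_adjusted$per_capita <- df_adjusted[[2]] / df_adjusted$Population * per
  df_adjusted
}

# Small tables built from two rows shown earlier in this walkthrough
pop_small <- data.frame(State = c("Alaska", "Alabama"),
                        Abbrev = c("AK", "AL"),
                        Population = c(741894, 4863300),
                        stringsAsFactors = FALSE)
sb_small <- data.frame(State_Abbreviation = c("AK", "AL"),
                       Starbucks = c(42, 65),
                       stringsAsFactors = FALSE)

result <- pc_adjust_local(sb_small, pop_small, "Abbrev")
```

Passing the population table as an argument trades a little convenience for a function that can be checked without a network connection.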

File > New Project > New Directory > R Package
Give your package a one-word name. Some tips on figuring out the best name.
R/ folder, where you save your function code
DESCRIPTION file, for package metadata
NAMESPACE file, which lists what your package exports and imports (roxygen2 generates it for you later)
Questions about which license to use? Check out the options.
Also, notice that I added Imports: dplyr because this function won't work without the left_join function from dplyr.
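For reference, here is a rough sketch of what a DESCRIPTION file with that Imports line might contain. It is written from R so the fields can be parsed back with `read.dcf()`. The package name, title, version, description and license below are placeholders assumed for illustration, not the contents of the author's actual file.

```r
# Hypothetical DESCRIPTION contents; every value except "Imports: dplyr"
# is a placeholder, not taken from the author's package.
description_lines <- c(
  "Package: mypackage",
  "Title: Adjust State Counts by Population",
  "Version: 0.1.0",
  "Description: Joins state-level counts to population data and computes per capita rates.",
  "License: MIT + file LICENSE",
  "Imports: dplyr"
)

# Write to a temporary file and read it back to check that the fields parse
desc_path <- file.path(tempdir(), "DESCRIPTION")
writeLines(description_lines, desc_path)
desc <- read.dcf(desc_path)

# The Imports field is what tells R the package depends on dplyr
desc[, "Imports"]
```

In a real project you would edit the DESCRIPTION file that RStudio created rather than writing one to a temporary folder.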
Copy and paste the pc_adjust function you made into a new script file.
pc_adjust <- function(any_df, state_type){
  pop <- read.csv("https://docs.google.com/spreadsheets/d/16oW_uvRJCNoOnCeAkJH4fDouFokjaGUdGFUCaFdKd6I/pub?output=csv", stringsAsFactors=F)
  # State type options are either "Abbrev" or "State"
  colnames(any_df)[1] <- state_type
  df_adjusted <- left_join(any_df, pop, by=state_type)
  # Rate per 1,000,000 residents (the step-by-step version above used 100,000)
  df_adjusted$per_capita <- df_adjusted[,2] / df_adjusted$Population * 1000000
  return(df_adjusted)
}
Name the file after the function, pc_adjust.R, and save it into the R/ folder.
Go back to your pc_adjust.R script and add these lines above the code.
#' Population adjuster
#'
#' This function appends state population data
#' @param any_df The name of the dataframe you want to append to
#' @param state_type If the state identification is abbreviations, use "Abbrev"; if it is the full state name, use "State"
#' @keywords per capita
#' @import dplyr
#' @export
#' @examples
#' pc_adjust(dataframe, "Abbrev")
roxygen2 will compile these special comments above the function into R's documentation format.
Watch.
Run these lines in console.
install.packages("roxygen2")
library(roxygen2)
roxygenise()
It wrote to the NAMESPACE file and created a pc_adjust.Rd file based on the special comments.
This would've been tough to put together by hand.
Press Cmd + Shift + B (Ctrl + Shift + B on Windows and Linux) to build the package.
Once the package is built and installed, just run
library("whateveryoucalledyourpackage")
in any R session and you can run pc_adjust whenever you want.
Type
?pc_adjust
This is what your special comments above your R function helped generate.

To share your package on GitHub, add some clean documentation, such as a README.md file. Once the package is on GitHub, anyone can install it with devtools:
install.packages("devtools")
library(devtools)
install_github("andrewbtran/abtnicarr")
library(abtnicarr)
From Giora Simchoni:
Keep adding functions to your package.
Perhaps, create a Shiny version of it for those who don't use R.
Over time you'll build up a bunch that you'll rely on over and over again.
If it's awesome, submit it to CRAN.
This was an extremely simple version of making a package.
For more detail, check out the free book R Packages by Hadley Wickham.